Extended-Alphabet Finite-Context Models

Authors

João M. Carvalho

Susana Brás

Diogo Pratas

Jacqueline Ferreira

Sandra C. Soares

Armando J. Pinho

Abstract

The Normalized Relative Compression (NRC) is a recent dissimilarity measure related to the Kolmogorov complexity. It has been used successfully in several applications, such as DNA sequences, images, and even ECG (electrocardiographic) signals. It relies on a compressor that compresses a target string using exclusively the information contained in a reference string. One possible approach is to use finite-context models (FCMs) to represent the strings. A finite-context model calculates the probability distribution of the next symbol, given the previous k symbols. In this paper, we introduce a generalization of the FCMs, called extended-alphabet finite-context models (xaFCMs), that calculates the probability of occurrence of the next d symbols, given the previous k symbols. We perform experiments on two sample applications using the xaFCMs and the NRC measure: ECG biometric identification, using a publicly available database, and estimation of the similarity between DNA sequences of two different, but related, species, chromosome by chromosome. In both applications, we compare the results against those obtained by the FCMs. The results show that the xaFCMs use less memory and computational time to achieve the same or, in some cases, even more accurate results.

arXiv:1709.07346v2 [cs.IT] 15 Mar 2018
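The idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names (`train_xafcm`, `relative_bits`, `nrc`), the additive smoothing parameter `alpha`, and the normalization of the bit count by |x| log2 |A| are assumptions made for illustration, and the code computes an ideal code length instead of running an actual arithmetic coder. Setting `d = 1` gives an ordinary FCM of order k.

```python
import math
from collections import defaultdict


def train_xafcm(reference, k, d):
    """Count, in the reference only, the d-symbol blocks that follow each k-symbol context."""
    counts = defaultdict(lambda: defaultdict(int))
    for i in range(k, len(reference) - d + 1):
        context = reference[i - k:i]
        block = reference[i:i + d]
        counts[context][block] += 1
    return counts


def relative_bits(target, counts, k, d, alphabet_size, alpha=1.0):
    """Ideal code length (bits) of target under the frozen reference counts.

    Assumption: additive (alpha) smoothing over the |A|^d possible blocks.
    """
    n_blocks = alphabet_size ** d
    uniform_cost = math.log2(alphabet_size)
    # The first k symbols have no full context: charge a uniform cost.
    i = min(k, len(target))
    bits = i * uniform_cost
    # Encode the target d symbols at a time; the model is never updated.
    while i + d <= len(target):
        context = target[i - k:i]
        block = target[i:i + d]
        ctx_counts = counts.get(context, {})
        total = sum(ctx_counts.values())
        p = (ctx_counts.get(block, 0) + alpha) / (total + alpha * n_blocks)
        bits -= math.log2(p)
        i += d
    # Fewer than d symbols left over: charge a uniform cost.
    bits += (len(target) - i) * uniform_cost
    return bits


def nrc(target, reference, k, d, alphabet):
    """Assumed normalization: bits needed divided by |target| * log2(|alphabet|)."""
    counts = train_xafcm(reference, k, d)
    bits = relative_bits(target, counts, k, d, len(alphabet))
    return bits / (len(target) * math.log2(len(alphabet)))


if __name__ == "__main__":
    reference = "ACGT" * 50
    print(nrc("ACGT" * 10, reference, k=2, d=2, alphabet="ACGT"))
    print(nrc("A" * 40, reference, k=2, d=2, alphabet="ACGT"))
```

Under these assumptions, a target that shares structure with the reference yields a value close to 0, while a target the reference cannot predict yields a value close to 1.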

Similar resources

Stochastic chains with memory of variable length

Stochastic chains with memory of variable length constitute an interesting family of stochastic chains of infinite order on a finite alphabet. The idea is that for each past, only a finite suffix of the past, called context, is enough to predict the next symbol. These models were first introduced in the information theory literature by Rissanen (1983) as a universal tool to perform data compres...

Complexity of Problems for Commutative Grammars

We consider Parikh images of languages accepted by non-deterministic finite automata and context-free grammars; in other words, we treat the languages in a commutative way — we do not care about the order of letters in the accepted word, but rather how many times each one of them appears. In most cases we assume that the alphabet is of fixed size. We show tight complexity bounds for problems li...

Automata and Logics for Words and Trees over an Infinite Alphabet

In a data word or a data tree, each position carries a label from a finite alphabet and a data value from some infinite domain. These models have been considered in the realm of semistructured data, timed automata and extended temporal logics. This paper surveys several known results on automata and logics manipulating data words and data trees, the focus being on their relative expressive power a...

On Commutative Context-Free Languages

Let $\Sigma = \{a_1, a_2, \ldots, a_n\}$ be an alphabet and let $L \subset \Sigma^*$ be the commutative image of $FP^*$, where $F$ and $P$ are finite subsets of $\Sigma^*$. If, for any permutation $\sigma$ of $\{1, 2, \ldots, n\}$, $L \cap a_{\sigma(1)}^* \cdots a_{\sigma(n)}^*$ is context-free, then $L$ is context-free. This theorem provides a solution to the Fliess conjecture in a restricted case. If the result could be extended to finite unions of the $FP^*$ above, the Fliess conjectur...

Zero Temperature Limits of Gibbs-Equilibrium States for Countable Alphabet Subshifts of Finite Type

Let A be a subshift of finite type on a countably infinite alphabet, and suppose that the function f : A → ℝ has summable variations. Further assumptions on f ensure it has a unique Gibbs-equilibrium state μ_f (see Section 2 for more details). The purpose of this article is to analyse the behaviour, as t → ∞, of the Gibbs-equilibrium states μ_{tf} of tf. It will be shown that the family (μtf )t 1 ...

Journal:

CoRR

Volume abs/1709.07346, Issue -

Pages -

Publication date: 2017
